Abstract. In order to analyze and extract different structural properties of distributions,
one can introduce different coordinate systems over the manifold of distributions. In
Evolutionary Computation, the Walsh bases and the Building Block Bases are often used
to describe populations, which simplifies the analysis of evolutionary operators applied to
populations. Quite independently of these approaches, information geometry has been
developed as a geometric way to analyze different order dependencies between random variables
(e.g., neural activations or genes).

In these notes I briefly review the essentials of various coordinate bases and of information 
geometry. The goal is to give an overview and make the approaches comparable. Besides 
introducing meaningful coordinate bases, information geometry also offers an explicit way to 
distinguish different order interactions and it offers a geometric view on the manifold and 
thereby also on operators that apply on the manifold. For instance, uniform crossover can be
interpreted as an orthogonal projection of a population along an m-geodesic, monotonically
reducing the $\theta$-coordinates that describe interactions between genes.



1 Introduction 

Evolution can be understood as a process on the space $\Lambda$ of distributions over the search
space $\Omega$. Essentially, a parent population can be captured as a (finite) distribution
$p \in \Lambda$. Mutation and recombination operators (MC) applied on the parent population specify
a search (offspring) distribution $q \in \Lambda$. And a (stochastic) selection operator maps $q$ to
a new parent population $p'$. In this view, evolution can be understood as a process

$$p \xrightarrow{\text{MC}} q \xrightarrow{\text{selection}} p' \xrightarrow{\text{MC}} q' \xrightarrow{\text{selection}} p'' \xrightarrow{\text{MC}} \cdots$$

We do not need to go into the details of the indicated recombination, mutation, and selection 
operators here. Instead, we would like to emphasize an information theoretic point of view 
on this process. Typically, the mapping $p \mapsto q$ (which one could also call search heuristic)
from the parent population to the search distribution adds entropy whereas selection
$q \mapsto p'$ reduces entropy. Another interesting observable in this process is the structure of
the distributions, by which we mean the mutual information present in these distributions. For
instance, one can show that ordinary mutation and crossover operators (on a direct genetic
representation) generally reduce mutual information, i.e., destroy structural content that
might have been present in $p$ after selection (Toussaint 2004).



The analysis of the structure of distributions is an important topic in various areas. In 
evolutionary computation, the Walsh spectrum is a prominent way to analyze the structure 
of p, often with the aim to transport it to q. The Walsh coefficients may also be considered 
as a way of describing epistasis. In complex systems, certain mutual information measures
are often used to define the structuredness (in their terms: complexity) of dynamical systems
(Langton 1990; Sporns & Tononi 2002).

In these notes, I want to briefly review the information geometric way to describe the struc- 
ture of a distribution (Amari 1999; Amari 2001) and relate it to the field of evolutionary 
computation. The first step is simply to present the coordinates introduced by Walsh
coefficients side-by-side with those used in information geometry to make them comparable.
This gives an intuition about the "bases" over which distributions can be analyzed and
reveals, for instance, that the so-called Building-Block-Basis (Chryssomalakos & Stephens
2004), as introduced in Evolutionary Computation, is the same as Amari's $\eta$-basis. Amari's
$\theta$-basis is perhaps most interesting for its capability to capture $k$th-order mutual
dependencies precisely. It offers a notion of the "order-spectrum of mutual information"
alternative to the Walsh spectrum. Eventually, Amari's formalism allows one to decompose any
distribution completely into its different $k$th-order components.

Finally, the geometry introduced over the space of distributions by Amari gives very
insightful interpretations of distances between distributions. A Pythagoras theorem can be
formulated for the Kullback-Leibler divergence. Under some conditions, minimizations of
the Kullback-Leibler divergence can often be interpreted as orthogonal projections. This
offers a geometric view on some evolutionary operators.



Distributions, log-probabilities, and hypercube bases The most direct "coordinate
system" that can be introduced on the manifold of distributions is given by the probabilities
$p(x)$ for all $x \in \Omega$ itself. To preserve notational uniformity with other coordinate
systems we write these numbers as $p_x \equiv p(x)$, which means that $p_x$ is the $x$-th
component of $p \in \Lambda$ in the direct basis. Because of the normalization constraint
$\sum_x p_x = 1$, these are only $|\Omega| - 1$ independent coordinates.

Clearly, instead of using $p_x$ as coordinates, one can also use their logs $l_x := -\log p_x$.
Taking the log of probabilities is, roughly speaking, related to changing to entropic units.
(Note the definition of the entropy of $p$ as $H(p) = -\sum_x p_x \log p_x = E_p\{l_x\}$.) Thus,
coordinates that have some "entropic meaning" (i.e., are related to information theoretic
measures like entropy, mutual information, or Kullback-Leibler divergence) will be based on
these log quantities. Namely, this will be the $\theta$-coordinate system introduced by Amari
(see Amari 1999; Amari 2001).

In the following we will speak of bases of coordinate systems. Essentially, what we mean are
basis functions, similar to the sine and cosine in the Fourier transform. For illustration, we
will always think of $\Omega$ as the hypercube; the basis functions then correspond to
"colorings" of the hypercube with function values (mostly 1, 0, or $-1$). E.g., if
$e_i : \Omega = \{0,1\}^n \to \{1, 0, -1\}$ is the $i$-th basis function, then the $i$-th
coordinate of a distribution $p$ in this coordinate system is the scalar product of $p$ with
$e_i$: $p_i = \langle e_i, p \rangle := \sum_{x \in \Omega} e_i(x)\, p(x)$. We illustrate such
basis functions by 3D-hypercubes, where the bullet corresponds to 1, the circle to $-1$,
and empty vertices to 0.

The basis of the direct coordinate system is the $\delta$-basis: the set of all hypercubes
where only one vertex is 1 and all others are 0.



2 Notations 
Marginals over k-tuples of variables and schemata In the following, we will also
need a compact notation for the different marginals of a distribution. Let $\Omega$ be a product
space $\Omega = \Omega^1 \times \cdots \times \Omega^n$ such that we can define the marginals of a
distribution $p$ over single variables but also pairs, triples, and $k$-tuples of variables. We
use indices $i, j, .. \in I = \{1, .., n\}$ to indicate variables and write the marginals as
$p^{ij..}$,

$$p^{ij..}(a, b, ..) = \Pr\{x_i = a,\ x_j = b,\ ..\} .$$

The set of all possible marginals is given by considering all single indices $i$, all pairs
$i < j$, all triples $i < j < k$, etc. To simplify notation (e.g., summation over such objects),
we collect all these tuples of indices in a set

$$A = I \cup \{(i,j) \mid i<j \in I\} \cup \{(i,j,k) \mid i<j<k \in I\} \cup \cdots \cup \{(1,2,..,n)\}$$

$$= \{1, .., n, (1,2), (1,3), .., (1,n), (2,3), (2,4), .., (n-1,n), (1,2,3), .., (1,2,3,..,n)\} .$$

In that way, all marginals of $p$ are given as $p^a$ for $a \in A$. Note that $|A| = |\Omega| - 1$.

Besides using $a \in A$ to indicate a marginal, one can equivalently use the schemata notation
of length-$n$ strings in $\{*, d\}^n$: For a given $a$, the corresponding schema is the string of
all *'s except for those positions indicated in the tuple $a$. E.g., for $n = 6$:

$$p^{245} \equiv p^{*d*dd*}$$
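
The index set and the schema notation can be made concrete with a few lines of Python. This is only an illustrative sketch: the helper names `index_set` and `schema` are hypothetical, and the check $|A| = 2^n - 1$ assumes binary variables.

```python
from itertools import combinations


def index_set(n):
    """All non-empty index tuples over I = {1, .., n}, ordered by tuple length."""
    return [a for k in range(1, n + 1) for a in combinations(range(1, n + 1), k)]


def schema(a, n):
    """Schema string for index tuple a: 'd' at the positions in a, '*' elsewhere."""
    return "".join("d" if i in a else "*" for i in range(1, n + 1))


A = index_set(3)
print(A)                     # singles, then pairs, then the triple
print(len(A) == 2 ** 3 - 1)  # |A| = |Omega| - 1 for binary variables
print(schema((2, 4, 5), 6))  # *d*dd*
```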



3 Walsh, $\eta$-, $\theta$-, Building Block, and Haar bases



Table 1 captures the basics of the Walsh, $\eta$-, $\theta$-, and Haar bases. In all cases, the
coordinate system is defined by the basis functions depicted for the 3D-case as hypercubes.
Actually, these 3D illustrations of the basis functions $e_i$ are already sufficient to infer the
basis functions for all $n$ since they are constructed in a very systematic way, which seems
obvious by simply looking at them and becomes rigorous by considering the transformation
matrices into these coordinate systems:

The transformation matrices map linearly (mod 2) from the direct coordinates $p_x$ to the
new coordinates. E.g., in the Walsh case, $w_y = \sum_x W_{yx}\, p_x$. The rows in these matrices
correspond to the basis functions $e_y = W_{y\cdot}$. An important property is that in all cases
(except the Haar bases!), the transformation matrices can be constructed by repeated tensor
products of a 2D matrix. For instance, for $n = 2$ in the Walsh case:

$$W^{n=2} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}^{\otimes 2}$$

Here, we introduced the superscript notation $\otimes n$ to indicate the $n$-fold tensor product.
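
The tensor product construction can be checked numerically. The following is a minimal sketch in plain Python (helper names are hypothetical); it builds the Walsh matrix from the 2D seed, compares it with the closed form $W_{yx} = (-1)^{|x \,\mathrm{AND}\, y|}$ under the assumption that states are indexed by their binary value, and confirms that the matrix is its own inverse up to the factor $2^n$.

```python
def kron(A, B):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def tensor_power(M, n):
    """n-fold tensor product of M with itself (the 1x1 identity for n = 0)."""
    out = [[1]]
    for _ in range(n):
        out = kron(out, M)
    return out


def walsh_closed_form(n):
    """W[y][x] = (-1)^{|x AND y|}, with x and y read as n-bit integers."""
    size = 2 ** n
    return [[(-1) ** bin(x & y).count("1") for x in range(size)]
            for y in range(size)]


def matmul(A, B):
    """Plain matrix product for lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


W1 = [[1, 1], [1, -1]]
W2 = tensor_power(W1, 2)
for row in W2:
    print(row)

# Compare the tensor construction with the closed form and check W W = 2^n I.
n = 3
W3 = tensor_power(W1, n)
identity_scaled = [[2 ** n if i == j else 0 for j in range(2 ** n)]
                   for i in range(2 ** n)]
print(W3 == walsh_closed_form(n), matmul(W3, W3) == identity_scaled)
```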

Table 1 summarizes the most important properties of these transformation matrices: their
closed form expression, their tensor product construction, and their inverse. When looking
at the table one should first observe the self-similar regularity of the transformation
matrices, which stems from their definition as repeated tensor products. The meaning of
the various bases becomes more intuitive when looking at the hypercube illustrations of the
basis. The Walsh basis, e.g., can nicely be compared to a Fourier basis: $e_{000}$ corresponds
to the constant function 1; $e_{001}$, $e_{010}$, $e_{100}$ could be viewed as sine functions
along the x-, y-, and z-axes, respectively; $e_{011}$, $e_{101}$, $e_{110}$ are products of sine
functions and capture 2nd order dependencies; and $e_{111}$ is the "highest frequency" basis
function capturing 3rd order dependencies.
Walsh

  $p_x = \frac{1}{2^n} \sum_y W_{xy}\, w_y$,  $W_{yx} = (-1)^{|x \,\mathrm{AND}\, y|}$

  [Transformation matrix for $n = 3$ (rows and columns indexed by 000, 001, .., 111) and
  hypercube colorings of the basis functions: graphics not reproduced.]

Amari's $\eta$ / BBB

Amari's $\theta$
Table 1: Overview over the different bases for the space of distributions. The first column
gives the definitions of the transformations and their inverse. Note that the $\theta$-basis is
defined in log-space. The transformation matrices are illustrated in the second column for
$n = 3$ using the symbols ■ = 1, o = $-1$, and • = 0. The third column illustrates the basis
functions $e_y$ (or $e_a$) as colorings of the hypercube $\{0,1\}^n$. Note that the basis functions
correspond to rows of the transformation matrix. The 1-norm $|x \,\mathrm{AND}\, y|$ of the AND of
two binary strings counts the 1-bits that they have in common.
The $\eta$-basis captures certain marginals relative to the all-1s string:

$$\eta_a = p^a(1, 1, .., 1) = \Pr\{x_i = 1 \text{ for all } i \in a\}$$

These can be thought of as the marginals over all possible Building-Blocks; thus it is also
called the Building-Block-Basis (BBB, cf. Chryssomalakos & Stephens 2004). This marginalization
becomes apparent in the hypercube colorings as the abundance of zeros (non-colored vertices
and dots in the matrix).

The $\theta$-basis combines the "frequency" idea of the Walsh basis with the marginalization: The
highest order basis function $\theta_{123}$ is analogous to the Walsh basis function $e_{111}$ and
detects highest order dependencies. Lower order dependencies though are only detected on a
marginal.

However, note that the $\theta$-basis is defined in log-space,
$\theta_a = \sum_x B^{\theta}_{ax} \log p_x$. We will find some implications of this in the next
section. Note that the transformation matrices of the $\eta$- (Building-Block-) and the
$\theta$-bases are related via $B^{\theta} = ((B^{\eta})^{-1})^{\top}$.

For completeness, we also indicated the Haar bases in Table 1. It cannot be derived as
repeated tensor products and we do not discuss it any further here. One argument made
about the Haar bases (Khuri 1994) is that the transformation matrix incorporates a lot of
0s. Thus, the coefficients are more efficient to compute than the Walsh coefficients. We add
here that the ratio of zeros in the $\eta$ and $\theta$ transformation matrices is
$1 - (3/4)^n$ and approaches 1 exponentially with the dimension $n$.
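
The zero ratio and the inverse-transpose relation between the $\eta$ and $\theta$ matrices can be verified with a small sketch. The two 2x2 seed matrices below are assumptions on my part, read off from the description of $\eta$ as marginals relative to the all-1s string and of $\theta$ as log-linear coefficients; they are not copied from Table 1, and the helper names are hypothetical.

```python
def kron(A, B):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def tensor_power(M, n):
    """n-fold tensor product of M with itself."""
    out = [[1]]
    for _ in range(n):
        out = kron(out, M)
    return out


def matmul(A, B):
    """Plain matrix product for lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def zero_ratio(M):
    """Fraction of entries of M that are exactly zero."""
    entries = [v for row in M for v in row]
    return entries.count(0) / len(entries)


# Assumed seeds (rows index the coordinate, columns the state x in {0, 1}):
ETA_SEED = [[1, 1], [0, 1]]     # eta_0 = p_0 + p_1 (normalization), eta_1 = p_1
THETA_SEED = [[1, 0], [-1, 1]]  # theta_0 = log p_0, theta_1 = log p_1 - log p_0

for n in (1, 2, 3, 4):
    B_eta = tensor_power(ETA_SEED, n)
    B_theta = tensor_power(THETA_SEED, n)
    size = 2 ** n
    identity = [[int(i == j) for j in range(size)] for i in range(size)]
    # B_theta = (B_eta^{-1})^T is equivalent to B_theta^T B_eta = I.
    assert matmul(transpose(B_theta), B_eta) == identity
    print(n, zero_ratio(B_eta), zero_ratio(B_theta), 1 - (3 / 4) ** n)
```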

4 Mathematical structure on the manifold $\Lambda$

In this section we want to develop a more geometric view on the manifold of distributions,
following (Amari 1999; Amari 2001). This geometry will put a special emphasis on the
$\eta$- and $\theta$-bases.

m- and e-geodesics An essential ingredient to describe the geometry of a manifold is
the definition of the notion of "straight lines", or geodesics, connecting two points in the
manifold. In the case of the manifold of distributions, there exist at least two ways of defining
a straight path connecting two distributions $q$ and $r$: the one being the linear mixture in
direct coordinates $p_x$, the other being the linear mixture in log coordinates $l_x$,

m-geodesic: $p(x) = (1-\alpha)\, q(x) + \alpha\, r(x)$ ,
e-geodesic: $\log p(x) = (1-\alpha) \log q(x) + \alpha \log r(x) - \psi(\alpha)$ .

Here m means mixture and e means exponential. The additional term $\psi(\alpha)$ in the
e-geodesic, which depends on $\alpha$ but not on $x$, is necessary to preserve the
normalization of $p(x)$.
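
As a small illustration, the two kinds of geodesics can be computed for distributions stored as plain lists of probabilities. This is a sketch under the assumption that both endpoints are strictly positive (otherwise the logarithm is undefined); the function names are hypothetical.

```python
import math


def m_geodesic(q, r, alpha):
    """Point on the m-geodesic: linear mixture of the probabilities."""
    return [(1 - alpha) * qx + alpha * rx for qx, rx in zip(q, r)]


def e_geodesic(q, r, alpha):
    """Point on the e-geodesic: linear mixture of the log-probabilities.

    Dividing by z implements the normalization term, psi = log z.
    """
    unnormalized = [math.exp((1 - alpha) * math.log(qx) + alpha * math.log(rx))
                    for qx, rx in zip(q, r)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


q = [0.1, 0.2, 0.3, 0.4]
r = [0.4, 0.3, 0.2, 0.1]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, m_geodesic(q, r, alpha), e_geodesic(q, r, alpha))
```

Both paths share their endpoints but generally differ in between, which is the reason two distinct notions of flatness arise.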

The fact that there exist two ways of defining geodesics means that there exist two
meaningful affine connections on the manifold. Both define a notion of flatness: we say that an
m-geodesic is m-flat and an e-geodesic is e-flat.

It turns out that the coordinate lines (and planes, hyperplanes, etc.) of $\eta$ are m-flat and
those of $\theta$ are e-flat. The former is obvious, since an m-geodesic can equivalently be
written in the $\eta$ coordinate system as
$\eta_a(p) = (1-\alpha)\, \eta_a(q) + \alpha\, \eta_a(r)$. The second becomes apparent
when realizing that the Taylor expansion of $\log p$ reads

$$\log p(x) = \sum_i \theta_i x_i + \sum_{i<j} \theta_{ij} x_i x_j + \sum_{i<j<k} \theta_{ijk} x_i x_j x_k + \cdots + \theta_{1..n}\, x_1 \cdots x_n - \psi = \sum_{a \in A} \theta_a x^a - \psi$$

where $x^a$ is the product of the components $x_{i_1} x_{i_2} \cdots x_{i_k} \in \{0,1\}$ when
$a = (i_1, i_2, .., i_k)$. Thus, an e-geodesic is written, in the $\theta$ coordinate system,
simply as $\theta_a(p) = (1-\alpha)\, \theta_a(q) + \alpha\, \theta_a(r)$.
Fisher metric, Kullback-Leibler divergence On this manifold $\Lambda$, there is a metric
defined, the Fisher metric. In arbitrary coordinates $v_i$ (it could be any of the Walsh, log,
$\eta$-, or $\theta$-coordinates), it reads

$$g_{ij} = E\Big\{ \frac{\partial \log p}{\partial v_i}\, \frac{\partial \log p}{\partial v_j} \Big\}$$

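Some intuition can be gained by realizing that, locally, the distance measured by the Fisher
metric coincides with the distance measured by the Kullback-Leibler divergence:¹ Consider
a point $p \in \Lambda$ and a nearby point $p + \delta p$. When we measure the squared length
$\langle \delta p, \delta p \rangle$ of the variation $\delta p$ by the Kullback-Leibler
divergence we find, up to a factor $\tfrac12$,

$$\tfrac12 \langle \delta p, \delta p \rangle = D(p : p + \delta p) = E\{\log p - \log(p + \delta p)\} \approx E\Big\{ -\frac{\delta p}{p} + \frac{\delta p^2}{2 p^2} \Big\} = \frac12\, E\Big\{ \frac{\delta p^2}{p^2} \Big\}$$

Here, the 2nd-order approximation stems from the Taylor expansion of $\log(p + \delta p)$ and
$E\{\delta p / p\} = 0$ since $\sum_x \delta p(x) = 0$ to preserve normalization. Note that, in
this infinitesimal neighborhood, the Kullback-Leibler divergence becomes symmetric.
Generalizing this to two small variations
$\delta_1 p = \partial_{v_i} p := \partial p / \partial v_i$ and
$\delta_2 p = \partial_{v_j} p := \partial p / \partial v_j$ induced by small shifts along
some coordinates we have

$$\langle \partial_{v_i} p, \partial_{v_j} p \rangle = E\Big\{ \frac{\partial_{v_i} p}{p}\, \frac{\partial_{v_j} p}{p} \Big\} = E\Big\{ \frac{\partial \log p}{\partial v_i}\, \frac{\partial \log p}{\partial v_j} \Big\}$$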

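and retrieve the Fisher metric. In turn, the Fisher metric $g_{ij}$ can also be derived by
considering the second order derivatives of the Kullback-Leibler divergence with respect to a
small coordinate shift $\delta v$:

$$g_{ij} = \frac{\partial^2}{\partial\, \delta v_i\; \partial\, \delta v_j}\, D\big(p(v) : p(v + \delta v)\big) \Big|_{\delta v = 0}$$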
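
The local agreement between the Kullback-Leibler divergence and the quadratic form of the Fisher metric can be observed numerically. The sketch below compares $D(p : p + \delta p)$ with $\tfrac12 \sum_x \delta p(x)^2 / p(x)$ for shrinking variations; the example distribution and direction are arbitrary choices, and the function names are hypothetical.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence D(p : q) for lists of probabilities."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)


def fisher_sq_length(p, dp):
    """Squared length <dp, dp> = E_p{(dp/p)^2} = sum_x dp(x)^2 / p(x)."""
    return sum(d * d / px for d, px in zip(dp, p))


p = [0.1, 0.2, 0.3, 0.4]
direction = [1.0, -2.0, 0.5, 0.5]  # sums to zero, so normalization is preserved

for eps in (1e-2, 1e-3, 1e-4):
    dp = [eps * d for d in direction]
    q = [px + d for px, d in zip(p, dp)]
    # The two numbers agree better and better as eps shrinks.
    print(eps, kl(p, q), 0.5 * fisher_sq_length(p, dp))
```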



Orthogonality of $\eta$ and $\theta$, the Pythagoras The coordinate systems $\eta$ and $\theta$
have a crucial property w.r.t. the Fisher metric: they are mutually orthogonal. At any point $p$
in the manifold the variations induced by shifts along $\theta$ and $\eta$ coordinates fulfill

$$\langle \partial_{\theta_a} p, \partial_{\eta_b} p \rangle = \delta_{ab} ,$$

where $\delta_{ab}$ is the Kronecker delta. Based on this one can derive a Pythagoras theorem:
Let $p$, $r$ and $q$ be three distributions where the m-geodesic connecting $p$ and $r$ is
orthogonal to the e-geodesic connecting $r$ and $q$, then

$$D(p : q) = D(p : r) + D(r : q) .$$

Please see figure 1 for an illustration.
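
A numerical instance of the Pythagoras relation can be sketched for three binary variables. The configuration is my own choice, made because it has closed forms: $r$ is the product of the single-variable marginals of $p$ (so $p$ and $r$ share those marginals), and $q$ is an arbitrary other product distribution (so $r$ and $q$ both carry no interactions). Helper names are hypothetical.

```python
import math
from itertools import product


def kl(p, q):
    """Kullback-Leibler divergence D(p : q) for dicts over the same states."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def bit_marginals(p, n):
    """Pr{x_i = 1} for each of the n binary variables."""
    return [sum(px for x, px in p.items() if x[i] == 1) for i in range(n)]


def product_distribution(marginals):
    """Distribution with the given single-bit marginals and no interactions."""
    n = len(marginals)
    return {x: math.prod(m if bit else 1.0 - m for m, bit in zip(marginals, x))
            for x in product((0, 1), repeat=n)}


n = 3
weights = [4, 1, 2, 3, 1, 5, 2, 6]  # arbitrary positive example weights
states = list(product((0, 1), repeat=n))
p = {x: w / sum(weights) for x, w in zip(states, weights)}

r = product_distribution(bit_marginals(p, n))  # same marginals as p
q = product_distribution([0.2, 0.5, 0.7])      # any other product distribution

lhs = kl(p, q)
rhs = kl(p, r) + kl(r, q)
print(lhs, rhs)  # the two values agree up to rounding
```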



k-cuts Let $k$ denote an order of interactions that we are interested in. Then, the
coordinates split into those describing interactions of order $\le k$ and those describing
interactions of order $> k$,

$$\eta_k := (\text{all } \eta_a \text{ of order } |a| \le k) ,$$

$$\theta_{k*} := (\text{all } \theta_a \text{ of order } |a| > k) .$$

¹ The Kullback-Leibler divergence $D(p : q)$ (also called relative entropy or divergence) is a
measure for the loss of information (or gain of entropy) when a true distribution $p$ is
approximated by a model distribution $q$. For example, when $p(x, y)$ is approximated by
$p(x)\,p(y)$ one loses information on the mutual dependence between $x$ and $y$. Accordingly,
the relative entropy $D(p(x,y) : p(x)\,p(y))$ is equal to the mutual information between $x$
and $y$. Generally, when knowing the real distribution $p$ one needs on average $H(p)$ (entropy
of $p$) bits to describe a random sample. If, however, we know only an approximate model $q$ we
would need on average $H(p) + D(p : q)$ bits to describe a random sample of $p$. The loss of
knowledge about the true distribution induces an increase of entropy and thereby an increase
of average description length for random samples.

Figure 1: The Pythagoras in the case when a certain k-cut is used to define the m- and
e-geodesics connecting to $r$, respectively $r'$. It holds: $D(p : q) = D(p : r) + D(r : q)$ and
$D(q : p) = D(q : r') + D(r' : p)$.

These can be mixed into a new coordinate system $(\eta_k, \theta_{k*})$. The point is that those
dimensions spanned by $\eta_k$ are orthogonal to those spanned by $\theta_{k*}$. To simplify the
discussion we call $\eta_k$ marginals (although they include marginals over $k$-tuples of
variables) and $\theta_{k*}$ higher order interactions. Keeping the marginals $\eta_k$ constant
defines m-flat sub-manifolds $M_k(\eta_k)$, which are disjoint for different $\eta_k$ and cover
all $\Lambda$. Keeping higher order interactions $\theta_{k*}$ constant defines e-flat
sub-manifolds $E_{k*}(\theta_{k*})$, which are disjoint for different $\theta_{k*}$ and cover
all $\Lambda$.

Complete decomposition of different order interactions Given a distribution $p$, we
define its $k$th-order reduction $p^{(k)}$ as the distribution with the same marginals
$\eta_k(p)$ as $p$ but vanishing higher order interactions $\theta_{k*} = 0$,

$$p^{(k)} = (\eta_k(p),\ \theta_{k*} = 0) .$$

That is, $p^{(k)}$ is the same distribution as $p$ except that all interactions of order $> k$
have been canceled. Given the Pythagoras it should be clear that $p^{(k)}$ can also be defined
as the orthogonal projection of $p$ onto the submanifold $E_{k*}(0)$ or as the orthogonal
projection of the uniform distribution $p^{(0)}$ onto $M_k(\eta_k(p))$, please see
figure 2 left,

$$p^{(k)} = \operatorname{argmin}_{q \in E_{k*}(0)} D(p : q) = \operatorname{argmin}_{q \in M_k(\eta_k(p))} D(q : p^{(0)}) .$$

Further, define $D_k(p) = D(p^{(k)} : p^{(k-1)})$. Then the Pythagoras allows us to decompose
the mutual information $I(p)$ in $p$ (i.e., the measure of all interactions in $p$) into a sum
of different order interactions:

$$I(p) = D(p : p^{(1)}) = \sum_{k=2}^{n} D_k(p)$$

Please see figure 2 right for an illustration.
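
The decomposition can be explored numerically for three binary variables. The sketch below computes the reductions by iterative proportional fitting started from the uniform distribution, which is a standard way to obtain the distribution with prescribed $k$-tuple marginals and no higher order interactions; this procedure is a swapped-in technique and not one described in these notes. It assumes a strictly positive $p$, a fixed number of sweeps instead of a convergence test, and hypothetical helper names.

```python
import math
from itertools import combinations, product


def kl(p, q):
    """Kullback-Leibler divergence D(p : q) for dicts over the same states."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def marginal(p, idx):
    """Marginal of p over the variables listed in idx."""
    m = {}
    for x, px in p.items():
        key = tuple(x[i] for i in idx)
        m[key] = m.get(key, 0.0) + px
    return m


def reduction(p, n, k, sweeps=500):
    """k-th order reduction of p by iterative proportional fitting from uniform."""
    states = list(product((0, 1), repeat=n))
    q = {x: 1.0 / len(states) for x in states}
    if k == 0:
        return q
    groups = list(combinations(range(n), k))
    targets = {g: marginal(p, g) for g in groups}
    for _ in range(sweeps):
        for g in groups:
            current = marginal(q, g)
            # Rescale q so that its marginal over the group g matches that of p.
            q = {x: qx * targets[g][tuple(x[i] for i in g)]
                 / current[tuple(x[i] for i in g)] for x, qx in q.items()}
    return q


n = 3
weights = [4, 1, 2, 3, 1, 5, 2, 6]  # arbitrary positive example weights
states = list(product((0, 1), repeat=n))
p = {x: w / sum(weights) for x, w in zip(states, weights)}

reds = [reduction(p, n, k) for k in range(n + 1)]  # p^(0), .., p^(n) = p
D = {k: kl(reds[k], reds[k - 1]) for k in range(2, n + 1)}
total_info = kl(p, reds[1])
print(total_info, sum(D.values()), D)
```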

This result should be highlighted. The given formalism allows one to explicitly distinguish
different order interactions between variables in a distribution and directly assigns
coordinates to those different order interactions. The quantities
$D_k(p) = D(p^{(k)} : p^{(k-1)})$ measure precisely and only the $k$th-order interactions in
entropic units.

For instance, consider three random variables $X_1, X_2, X_3$ which are pair-wise dependent in
the sense $I(X_i; X_j) \ne 0$. The question is whether there exist "true" 3rd-order interactions
or only concatenated 2nd-order interactions; in other terms, can they be described by a
Markov process $X_1 \to X_2 \to X_3$. The formalism gives an answer: if $D_3(p) = 0$ it is a
Markov process, otherwise there exist 3rd-order interactions.
Figure 2: The left figure illustrates a distribution $p$ and its $k$th-order reduction
$p^{(k)}$: It is the orthogonal projection of $p$ along $M_k(p)$ onto $E_{k*}(0)$. The
"distance" $D(p : p^{(k)})$ measures the "norm" of $\theta_{k*}$, i.e., it measures the amount of
mutual information of order higher than $k$. The right figure illustrates the complete
decomposition of $p$ in reductions $p^{(k)}$ of all orders. Every projection from $p^{(k)}$ to
$p^{(k-1)}$ is an orthogonal projection onto $E_{(k-1)*}(0)$. Every "distance"
$D(p^{(k)} : p^{(k-1)})$ measures the mutual information specifically of order $k$.



5 Geometric view on evolution operators 

Crossover In Evolutionary Algorithms, crossover is one means of mixing a parent
population to an offspring population. Populations can be formalized as distributions $p$ and a
definition of a simple form of crossover (uniform crossover parameterized with
$c \in \mathbb{R}$) reads

$$p \mapsto (1 - c)\, p + c\, p^{(1)} .$$

See, for instance, (Toussaint 2003) for a general definition of a crossover operator in more
conventional notation and details of when it reduces to this simple form.

This crossover simply mixes the original distribution (or population) $p$ with its 1st-order
reduction. The 1st-order reduction is the product of all single variable marginals, i.e., it is
the distribution with the same marginals (gene frequencies) as $p$ but all dependencies (gene
linkages) between the variables eliminated. From the geometrical point of view, crossover
makes a step along the m-geodesic connecting $p$ and $p^{(1)}$. It can be illustrated as a step
along the projection onto the submanifold $E_{1*}(0)$, please see figure 3.

From this view it becomes clear that a reasonable coordinate system to describe crossover
is $(\eta_1, \theta_{1*})$. Crossover does not change $\eta_1$ (it operates orthogonally to
$\eta_1$) but continuously reduces the $\theta_{1*}$ variables. That the $\theta_{1*}$ are
reduced and not increased is intuitive from figure 3 (reading the $\theta$'s as magnitudes of
interaction) and becomes apparent from the fact that the "distance" from $p$ to $p^{(1)}$,
$I(p) = D(p : p^{(1)})$, is a norm of $\theta_{1*}$.
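
The effect of this simple crossover can be illustrated for three binary genes: the gene frequencies stay fixed while the mutual information $I(p) = D(p : p^{(1)})$ shrinks as $c$ grows. This is a minimal sketch with an arbitrary example population; the helper names are hypothetical and $c$ is restricted to $[0, 1]$ here.

```python
import math
from itertools import product


def kl(p, q):
    """Kullback-Leibler divergence D(p : q) for dicts over the same states."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def bit_marginals(p, n):
    """Gene frequencies Pr{x_i = 1} for each of the n binary variables."""
    return [sum(px for x, px in p.items() if x[i] == 1) for i in range(n)]


def first_order_reduction(p):
    """Product of the single-variable marginals of p."""
    n = len(next(iter(p)))
    marg = bit_marginals(p, n)
    return {x: math.prod(m if bit else 1.0 - m for m, bit in zip(marg, x))
            for x in product((0, 1), repeat=n)}


def crossover(p, c):
    """Mix p with its 1st-order reduction: (1 - c) p + c p^(1)."""
    p1 = first_order_reduction(p)
    return {x: (1 - c) * px + c * p1[x] for x, px in p.items()}


def mutual_information(p):
    """I(p) = D(p : p^(1))."""
    return kl(p, first_order_reduction(p))


n = 3
weights = [4, 1, 2, 3, 1, 5, 2, 6]  # arbitrary positive example weights
states = list(product((0, 1), repeat=n))
p = {x: w / sum(weights) for x, w in zip(states, weights)}

for c in (0.0, 0.25, 0.5, 0.75, 1.0):
    pc = crossover(p, c)
    print(c, bit_marginals(pc, n), mutual_information(pc))
```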

Max Entropy Wright, Poli, Stephens, Langdon, & Pulavarty (2004) recently proposed
an evolutionary search scheme that constructs the new search distribution (offspring
population) via a maximum entropy principle: From the parent population all second order
schema frequencies are calculated. Then, from all the distributions which have the same
second order schema frequencies, the new offspring distribution is the one with maximum
entropy.

In our formalism, constraining the schema frequencies corresponds to fixing $\eta_2$, i.e.,
constraining the offspring distribution to the submanifold $M_2(\eta_2)$. The distribution with
maximal entropy in $M_2(\eta_2)$ must have minimal higher order (3rd-order or higher)
interactions $\theta_{2*}$ since interactions (mutual information) reduce entropy. Thus, the max
entropy rule simply amounts to setting $\theta_{2*} = 0$, i.e., choosing
$p^{(2)} = (\eta_2, 0)$ as the new offspring distribution.

Figure 3: Crossover is an operator that takes a step along the projection of $p$ towards the
first order reduction $p^{(1)}$.

Again, this can be viewed geometrically as the orthogonal projection of the parent population
$p$ onto $E_{2*}(0)$ according to

$$p^{(2)} = \operatorname{argmin}_{q \in E_{2*}(0)} D(p : q)$$

or as the orthogonal projection of the uniform distribution $p^{(0)}$ onto $M_2(\eta_2)$,

$$p^{(2)} = \operatorname{argmin}_{q \in M_2(\eta_2)} D(q : p^{(0)}) .$$

This latter way of writing the max entropy principle is quite intuitive: find the distribution
that fulfills the required constraints (lies on $M_2(\eta_2)$) but is closest to the uniform
distribution $p^{(0)}$.

Finally, note the strong analogy of the maximum entropy principle proposed by (Wright,
Poli, Stephens, Langdon, & Pulavarty 2004) and the simple crossover operator given before:
Crossover moves $p$ toward $p^{(1)}$, while the search heuristic considered by Wright et al.
chooses $p^{(2)}$ as the new search distribution.



6 Discussion 



The methods information geometry provides to analyze and describe the structure of
distributions are deeply grounded in information theory. For instance, it seems very beneficial
to have coordinate systems for distributions which capture precisely arbitrary $k$th-order
interactions between variables and have a direct link to measures like mutual information and
the Kullback-Leibler divergence. Also the geometric aspects, e.g., that some operations can
be described as orthogonal to certain submanifolds, add to a more comprehensive picture
of the space of distributions. In that sense, information geometric methods enhance more
common approaches in Evolutionary Computation, like the Walsh bases, in describing the
structure of distributions and operators.

However, the question remains how and whether these methods can be used to (1) actually 
propose new heuristic search algorithms or (2) to provide new theoretical tools to analyze 
the dynamics of evolutionary processes. 






Acknowledgment 



I would like to thank the German Research Foundation (DFG) for their funding of the Emmy 
Noether fellowship TO 409/1-1. 

References 

Amari, S. (1999). Information geometry on hierarchical decomposition of stochastic interactions. 
Citeseer preprint. 

Amari, S. (2001). Information geometry on hierarchy of probability distributions. IEEE
Transactions on Information Theory 47(5), 1701-1711.

Chryssomalakos, C. & C. R. Stephens (2004). What basis for genetic dynamics? In 2004 Genetic
and Evolutionary Computation Conference (GECCO 2004), pp. 1018-1029. Springer, Berlin.

Khuri, S. (1994). Walsh and Haar functions in Genetic Algorithms. In Proceedings of the 1994 
ACM Symposium on Applied Computing, pp. 201-205. 

Langton, C. (1990). Computation at the edge of chaos: Phase transitions and emergent
computation. Physica D 42, 12.

Sporns, O. & G. Tononi (2002). Classes of network connectivity and dynamics. Complexity 7, 
28-38. 

Toussaint, M. (2003). The structure of evolutionary exploration: On crossover, buildings blocks, 
and Estimation-Of-Distribution algorithms. In 2003 Genetic and Evolutionary Computation 
Conference (GECCO 2003), pp. 1444-1456. 

Toussaint, M. (2004). Non-trivial genotype-phenotype mappings and the evolution of represen- 
tations. Evolutionary Computation Journal. Submitted. 

Wright, A., R. Poli, C. Stephens, W. Langdon, & S. Pulavarty (2004). An Estimation of
Distribution Algorithm based on maximum entropy. In 2004 Genetic and Evolutionary Computation
Conference (GECCO 2004), pp. 343-354. Springer, Berlin.



